Avoid retaining saved tensors in fused norm custom ops under no_grad by LeSingh1 · Pull Request #2012 · NVIDIA/apex

LeSingh1 · 2026-05-31T23:18:51Z

Problem

FusedRMSNorm (and the sibling fused layer norm and RMS norm custom ops) leaks two CUDA tensors per forward call under torch.no_grad(), as reported in #1999. On the torch.library.custom_op path (PyTorch >= 2.4), the setup_context functions call save_for_backward unconditionally. Those saved tensors are retained in autograd metadata that is not released after a no_grad forward, so every call leaks the saved activation and invvar.

Fix

In each affected setup_context (apex/normalization/fused_layer_norm.py), assign the scalar ctx fields first, then return early when torch.is_grad_enabled() is False, skipping save_for_backward. Backward can never run under no_grad, so nothing is lost, and the grad-enabled training path is unchanged.
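The early-return pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in apex/normalization/fused_layer_norm.py: the function name `setup_context_sketch`, the assumed `(input, weight, normalized_shape, eps)` input layout, the `(output, invvar)` output layout, and the `RecordingCtx` stand-in are all hypothetical and exist only to make the example runnable on CPU.

```python
import torch


class RecordingCtx:
    """Hypothetical stand-in for an autograd ctx; records what gets saved."""

    def __init__(self):
        self.saved = None

    def save_for_backward(self, *tensors):
        self.saved = tensors


def setup_context_sketch(ctx, inputs, output):
    # Assumed layouts (not taken from apex):
    #   inputs = (input, weight, normalized_shape, eps)
    #   output = (normalized_output, invvar)
    input_, weight, normalized_shape, eps = inputs
    _, invvar = output

    # Assign the scalar (non-tensor) fields first so they are always set.
    ctx.normalized_shape = normalized_shape
    ctx.eps = eps

    # Backward cannot run when grad is disabled, so skip saving tensors.
    if not torch.is_grad_enabled():
        return

    ctx.save_for_backward(input_, weight, invvar)
```

Calling the sketch inside `torch.no_grad()` leaves `ctx.saved` as `None`, while calling it with grad enabled records three tensors. Whether this removes the leak on the real custom-op path still depends on the runtime check requested below.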

Testing / verification status

Runtime-unverified: this was developed on a machine without a CUDA GPU, so the leak reproducer was not executed. The change compiles (py_compile) and passes ruff. The reasoning is that skipping the save under no_grad removes the retained references regardless of the exact internal mechanism. I'd appreciate a maintainer with a GPU running the issue's count_cuda_tensors reproducer to confirm delta == 0 before merge; happy to adjust if the leak originates elsewhere.
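For reviewers who want to run the check, a tensor-count delta can be measured along the following lines. This is a sketch, not the reproducer from #1999: `count_live_tensors` and `no_grad_delta` are hypothetical helper names, and the approach assumes the leaked tensors are visible to Python's garbage collector through `gc.get_objects()`.

```python
import gc

import torch


def count_live_tensors(device_type="cuda"):
    """Count tensors on the given device type that the Python GC tracks."""
    gc.collect()
    count = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.device.type == device_type:
                count += 1
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            continue
    return count


def no_grad_delta(fn, iters=10, device_type="cuda"):
    """Run fn repeatedly under no_grad and return the growth in live tensors."""
    with torch.no_grad():
        fn()  # warm-up call so one-time allocations are not counted
        before = count_live_tensors(device_type)
        for _ in range(iters):
            fn()
        return count_live_tensors(device_type) - before
```

On a GPU machine, `fn` would be a closure that runs a FusedRMSNorm forward on a CUDA input and discards the result. A delta of 0 would indicate no per-call growth, while a delta of roughly `2 * iters` would match the leak described in the issue.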

Developed with AI assistance.

Addresses #1999

The custom-op forward path for FusedRMSNorm/FusedLayerNorm registers an autograd setup_context that unconditionally calls save_for_backward. For torch.library custom ops these saved tensors are retained in autograd metadata that is not released after the call returns, so each forward under torch.no_grad() leaks the saved activation and the invvar tensor (two CUDA tensors per call), accumulating linearly in long-running inference (issue NVIDIA#1999).

Skip the save_for_backward calls when grad is disabled, since backward can never run in that case. The grad-enabled training path is unchanged.

Signed-off-by: LeSingh1 <sshaurya914@gmail.com>

LeSingh1 wants to merge 1 commit into NVIDIA:master from LeSingh1:fix-1999-fused-rmsnorm-nograd-leak
